Notes: cover where we got the data, describe the datasets, explain what we did with them and the resulting conclusions, and then give final conclusions (max 400 words).
# Full Report
This report uses data from an off-premise liquor store in Saporium, a small up-market shopping centre in Rosebery, NSW, Australia.
Rosebery has a population of approximately 10,000, with a median age of 33 and 2,500 families. The median weekly income is $1,900 and weekly rent is $580. 34% of residents have a university or tertiary education, which is twice the NSW average, and 67% work full-time. 58% of the population live in flats or apartments. The suburb has a diverse mix of cultures, with Chinese and Greek ancestry heavily over-represented compared to the NSW average. For 63% of the population, both parents were born overseas, just under double the NSW average. Eastern Orthodox affiliation is over-represented and Anglican affiliation is under-represented (ABS, 2016).
This report looks at the effect of weather on alcohol sales in the hope of improving the likelihood of small business success. Many small businesses fail due to incorrect product placement, marketing or pricing. A business's core role is to provide goods and services, and its success depends on its ability to sell products for revenue. It is therefore imperative that a business (1) stocks the correct products, (2) markets the correct products, and (3) prices its products effectively. In the past, these decisions relied on the intuition of a good business operator. However, with accessible data and analytics tools, we can now take a more scientific approach that any business can replicate.
There is a wealth of evidence that weather influences consumer behaviour and that understanding this leads to better marketing decisions (Murray et al., 2010). The key questions this report addresses are which weather conditions affect consumer spending and which conditions affect consumer behaviour.
Weather is defined as ______. Researchers indicate that weather alters the shopper's mood.
Research indicates that weather can affect consumer spending in three ways: (1) bad weather keeps people at home, reducing foot traffic and sales; (2) weather can influence store traffic and sales volume; and (3) weather can influence sales by affecting the customer's internal state.
We will use this report to show small businesses how they can use their data to increase profitability by making informed decisions about stocking, marketing and pricing their products, helping them find the signal in the noise.
# LOAD DATA
library(ggplot2)
library(tidyr)
library(xtable)
library(knitr)
data = read.csv("data/ProcessedData.csv")
# Quick look at top 6 rows of data
head(data)
## X Date Receipt.Number Total
## 1 0 2018-08-06 18179 27.00
## 2 4 2018-08-06 18178 24.00
## 3 7 2018-08-06 18177 47.00
## 4 11 2018-08-06 18176 27.00
## 5 16 2018-08-06 18175 31.05
## 6 21 2018-08-06 18174 33.99
## Details
## 1 1 X Cirillo Rose
## 2 2 X Fever Tree Elderflower Tonic 4pk
## 3 2 X Ps40 Smoked Lemonade + 1 X Athletes of Wine Vino Athletico Macedon Pinot noir
## 4 1 X Empty Wine Bottle 750ml + 1 X Unico Zelo Harvest Sauvignon Blanc KEG + -1 X Discount
## 5 3 X Frenchies Kolsch 330ml + 3 X Frenchies Comet Pale Ale 330ml + -1 X Discount
## 6 1 X Domaine Thomson - Explorer Pinot Noir
## Time Maximum.temperature..Degree.C. Rainfall.amount..millimetres.
## 1 18:42:40 19 0
## 2 18:03:54 19 0
## 3 17:45:58 19 0
## 4 17:32:56 19 0
## 5 16:26:31 19 0
## 6 15:05:09 19 0
## Size of data
dim(data)
## [1] 11870 8
## R's classification of data
class(data)
## [1] "data.frame"
## R's classification of variables
str(data)
## 'data.frame': 11870 obs. of 8 variables:
## $ X : int 0 4 7 11 16 21 24 34 37 40 ...
## $ Date : Factor w/ 404 levels "2017-06-24","2017-06-25",..: 404 404 404 404 404 404 404 404 404 404 ...
## $ Receipt.Number : Factor w/ 11870 levels "10000","10001",..: 8108 8107 8106 8105 8104 8103 8102 8101 8100 8079 ...
## $ Total : num 27 24 47 27 31.1 ...
## $ Details : Factor w/ 6558 levels "-1 X Adelaide Hills Distillery Dry Vermouth",..: 1075 5396 5594 1609 5787 1435 2251 1095 4873 3352 ...
## $ Time : Factor w/ 9974 levels "00:28:37","01:26:38",..: 8960 8071 7704 7424 6144 4551 3576 3415 3006 2451 ...
## $ Maximum.temperature..Degree.C.: num 19 19 19 19 19 19 19 19 19 19 ...
## $ Rainfall.amount..millimetres. : num 0 0 0 0 0 0 0 0 0 0 ...
#sapply(mtcars, class)
The dataset contains 11,870 sales, where each row represents one sale. Summary:
Complexity of data: We are looking at a dataset with 8 variables.
How does the maximum temperature affect the consumer decision when purchasing alcohol?
Insert text and analysis.
Let's start by looking at how much a person spends in the store on average. The number inside each bar is the number of transactions in that temperature category, and the line across each bar marks the median amount spent. As the chart shows, people seem to spend more per purchase in milder temperatures (15 - 40 degrees), while in more extreme temperatures (10 - 15 and 40 - 45 degrees) they tend to spend less. We can also observe that people tend to spend more when the temperature is between 35 and 40 degrees.
#The temperature is a quantitative variable. We start by changing it to a qualitative one using ranges that each cover 5 degrees Celsius
temp = data$Maximum.temperature..Degree.C.
data$tempGroups = cut(temp, c(10,15,20,25,30,35,40,45))
#Take a look at overall data before looking at the graphs
heatData = data %>% drop_na(Maximum.temperature..Degree.C.)
dataFrame <- data.frame(Rows = c(nrow(data)-nrow(heatData)),
Max = c(max(temp)),
Min = c(min(temp)),
Mean = c(mean(temp)),
Median = c(median(temp)))
kable(dataFrame, caption = "Summary of maximum temperature (degrees Celsius)", col.names = c("Missing rows", "Max heat", "Min heat", "Mean maximum heat", "Median of maximum heat"))
| Missing rows | Max heat | Min heat | Mean maximum heat | Median of maximum heat |
|---|---|---|---|---|
| 0 | 43.4 | 14.3 | 23.28904 | 23.2 |
#Transaction sizes for each temperature range
meanPerPerson = aggregate(data$Total ~ data$tempGroups, data, mean)
medPerPerson = aggregate(data$Total ~ data$tempGroups, data, median)
transactions = merge(x = meanPerPerson, y = medPerPerson, by='data$tempGroups')
names(transactions) = c('Temperature', 'Mean_total', 'Median_total')
Fre <- as.data.frame(table(data$tempGroups))
colnames(Fre)[1] <- "tempGroups"
Fre$lab <- as.character(Fre$Freq)
Fre
## tempGroups Freq lab
## 1 (10,15] 68 68
## 2 (15,20] 3720 3720
## 3 (20,25] 3599 3599
## 4 (25,30] 3560 3560
## 5 (30,35] 651 651
## 6 (35,40] 246 246
## 7 (40,45] 26 26
#Barplot for average money spent with median lines
ggplot(transactions, aes(Temperature, Mean_total)) +
  geom_bar(stat = "identity", position = "dodge", fill = "#FF6666") +
  ggtitle("Average money spent (in dollars) per purchase for different temperatures") +
  ylab("Dollars") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  # Horizontal line at the median: an error bar with ymin equal to ymax
  geom_errorbar(data = transactions, aes(Temperature, ymax = Median_total, ymin = Median_total), size = 1, linetype = "solid", inherit.aes = F, width = 0.9) +
  # Number of transactions in each temperature range
  geom_text(aes(label = Fre$Freq), position = position_dodge(width = 0.9), vjust = 1.5) +
  scale_x_discrete(labels = c('10 - 15','15 - 20','20 - 25', '25 - 30', '30 - 35', '35 - 40', '40 - 45'))
#Barplot showing only median
#ggplot(transactions, aes(Temperature, Median_total)) + geom_bar(stat="identity", position = "dodge", fill = "#56B4E9") + ggtitle("Median how much each person spends (in dollars) for different temperature") + ylab("Median") + theme_bw() + theme(plot.title = element_text(hjust = 0.5))
#ggplot(data, aes(Maximum.temperature..Degree.C., median(Total)), group = 1) + geom_boxplot() + coord_flip()
#Number of transactions per temperature range
#barplot(table(data$tempGroups))
Let's look at what an average day looks like in total sales for each temperature range. This seems to back up what we said before: in extreme temperatures (10 - 15 and 40 - 45 degrees) people seem to buy less alcohol, while in milder temperatures they seem to buy a lot more. Note, however, that the two extreme ranges cover only 4 days and 1 day respectively. We also see the spike again when the temperature is between 35 and 40 degrees.
#Total money spent for each temperature range
totalPerDay = aggregate(data$Total ~ data$tempGroups, data, sum)
nrOfDaysPerTemp = aggregate(data$Date ~ data$tempGroups, data, function(x) length(unique(x)))
totals = merge(x = totalPerDay, y = nrOfDaysPerTemp, by='data$tempGroups')
names(totals) = c('Temperature', 'Total', 'NrOfDays')
totals['meanPerDay'] = round(totals$Total / totals$NrOfDays, 1)
totals
## Temperature Total NrOfDays meanPerDay
## 1 (10,15] 2973.69 4 743.4
## 2 (15,20] 201000.54 116 1732.8
## 3 (20,25] 200114.10 136 1471.4
## 4 (25,30] 192437.13 122 1577.4
## 5 (30,35] 32659.84 18 1814.4
## 6 (35,40] 15485.55 7 2212.2
## 7 (40,45] 854.10 1 854.1
ggplot(totals, aes(Temperature, meanPerDay)) +
  geom_bar(stat = "identity", position = "dodge", fill = "#56B4E9") +
  ggtitle("Average money spent (dollars) in one day for different temperatures") +
  ylab("Dollars") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_discrete(labels = c('10 - 15','15 - 20','20 - 25', '25 - 30', '30 - 35', '35 - 40', '40 - 45'))
Summary: Looking at the mean and median purchase values, we see little change in consumer behaviour over the 15-35 degree range. The more extreme temperatures have more of an effect. On the coldest days (10-15 degrees) there is a definite drop in the amount spent per purchase. On the very hottest days (40-45 degrees) there is also a large drop in the amount spent per transaction, although it is worth noting that there were very few transactions in that range. Another interesting feature is the spike in sales in the 35-40 degree range. This could be because people drink more alcohol in the heat; however, such temperatures are typical around Christmas, when people are on vacation and drink more alcohol in general.
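Because the extreme temperature ranges contain very few transactions, one way to keep that caveat visible is to report the sample size next to each statistic and flag small groups. The following is only a minimal sketch on made-up data, not the store's sales; the names `toy`, `minN` and `binSummary` and the threshold value are assumptions chosen for illustration.

```r
# Toy data (made up for illustration; not the store's sales)
toy <- data.frame(
  Total = c(20, 35, 50, 25, 30, 45, 60, 15),
  tempGroups = factor(c("(10,15]", "(15,20]", "(15,20]", "(15,20]",
                        "(20,25]", "(20,25]", "(20,25]", "(40,45]"))
)

# Hypothetical threshold below which a group is treated as too small
minN <- 2

# Count, mean and median of Total for each temperature range
n <- tapply(toy$Total, toy$tempGroups, length)
means <- tapply(toy$Total, toy$tempGroups, mean)
medians <- tapply(toy$Total, toy$tempGroups, median)

# One row per temperature range, with a flag for small groups
binSummary <- data.frame(
  Temperature = names(n),
  n = as.vector(n),
  Mean_total = as.vector(means),
  Median_total = as.vector(medians),
  smallSample = as.vector(n) < minN
)
binSummary
```

With the real data, a suitable value for the threshold would be a judgment call; the point is only that the mean and median of a flagged group should be read with caution.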
How does rainfall affect the consumer decision when purchasing alcohol?
Insert text and analysis.
There are a few days for which we do not have rain data, so we start by removing those rows. We then take a closer look at the rainfall data.
rainData = data %>% drop_na(Rainfall.amount..millimetres.)
rain = rainData$Rainfall.amount..millimetres.
dataFrame <- data.frame(Rows = c(nrow(data)-nrow(rainData)),
Max = c(max(rain)),
Min = c(min(rain)),
Mean = c(mean(rain)),
Median = c(median(rain)))
kable(dataFrame, caption = "Summary of rainfall (millimetres)", col.names = c("Missing rows", "Max rainfall", "Min rainfall", "Mean rainfall", "Median of rainfall"))
| Missing rows | Max rainfall | Min rainfall | Mean rainfall | Median of rainfall |
|---|---|---|---|---|
| 74 | 69.4 | 0 | 1.665683 | 0 |
#We start by changing the rainfall from a quantitative variable to a qualitative one
rainData$rainGroups = cut(rain, c(0,0.1,15,70), include.lowest = TRUE)
#Transaction sizes for each rainfall range
meanPerPerson = aggregate(rainData$Total ~ rainData$rainGroups, rainData, mean)
medPerPerson = aggregate(rainData$Total ~ rainData$rainGroups, rainData, median)
transactions = merge(x = meanPerPerson, y = medPerPerson, by='rainData$rainGroups')
names(transactions) = c('Rainfall', 'Mean_total', 'Median_total')
FreRain <- as.data.frame(table(rainData$rainGroups))
colnames(FreRain)[1] <- "rainGroups"
#Barplot for average money spent with median lines
ggplot(transactions, aes(Rainfall, Mean_total)) + geom_bar(stat="identity", position = "dodge", fill = "#FF6666") + ggtitle("Average money spent (in dollars) per purchase for different rainfall") + ylab("Dollars") + xlab("Rainfall") + theme_bw() + theme(plot.title = element_text(hjust = 0.5)) + geom_errorbar(data=transactions, aes(Rainfall, ymax = Median_total, ymin = Median_total), size=1, linetype = "solid", inherit.aes = F, width = 0.9) + geom_text(aes(label = FreRain$Freq), position = position_dodge(width = 0.9), vjust = 1.5) + scale_x_discrete(labels = c('No rain','small rain', 'heavy rain'))
#Number of transactions for each rainfall range
ggplot(rainData, aes(rainGroups)) + geom_bar() + scale_x_discrete(labels = c('No rain','small rain', 'heavy rain')) + ylab("Number of transactions") + xlab("Rainfall") + theme_bw() + ggtitle("Total number of transactions for different rainfall") + theme(plot.title = element_text(hjust = 0.5))
FreRain
## rainGroups Freq
## 1 [0,0.1] 8274
## 2 (0.1,15] 3200
## 3 (15,70] 322
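Since the median rainfall is 0 mm, most transactions sit exactly on the lowest break of the rainfall bins, so the `include.lowest = TRUE` argument to `cut()` matters a great deal here. The sketch below uses made-up rainfall values (the vector `rain` and the other names are illustrative assumptions) to show what happens with and without it.

```r
# Made-up rainfall amounts in millimetres (for illustration only)
rain <- c(0, 0.1, 0.2, 15, 15.1, 69.4)
breaks <- c(0, 0.1, 15, 70)

# With include.lowest = TRUE the first bin is [0,0.1], so 0 mm is kept
withLowest <- cut(rain, breaks, include.lowest = TRUE)

# Without it the first bin is (0,0.1], so 0 mm becomes NA
withoutLowest <- cut(rain, breaks)

table(withLowest)
sum(is.na(withoutLowest))
```

Without `include.lowest = TRUE`, every dry day would drop out of the grouped tables and charts as a missing value.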
#Total money spent for each rainfall range
totalPerDay = aggregate(rainData$Total ~ rainData$rainGroups, rainData, sum)
nrOfDaysPerRain = aggregate(rainData$Date ~ rainData$rainGroups, rainData, function(x) length(unique(x)))
rainTotals = merge(x = totalPerDay, y = nrOfDaysPerRain, by='rainData$rainGroups')
names(rainTotals) = c('Rainfall', 'Total', 'NrOfDays')
rainTotals['meanPerDay'] = round(rainTotals$Total / rainTotals$NrOfDays, 1)
ggplot(rainTotals, aes(Rainfall, meanPerDay)) + geom_bar(stat="identity", position = "dodge") + scale_x_discrete(labels = c('No rain','small rain', 'heavy rain')) + ylab("Dollars") + xlab("Rainfall") + theme_bw() + ggtitle("Average money spent (dollars) in one day for different rainfall") + theme(plot.title = element_text(hjust = 0.5))
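#Extract the hour of each sale from the time stamp (HH:MM:SS)
hour = as.integer(substr(as.character(data$Time), 1, 2))
#Bins are right-closed, so (8,9] holds sales made between 09:00 and 09:59
data$timeGroups = cut(hour, seq(8,20,1))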
table(data$timeGroups)
##
## (8,9] (9,10] (10,11] (11,12] (12,13] (13,14] (14,15] (15,16] (16,17]
## 13 523 838 1072 1399 1341 1402 1386 1507
## (17,18] (18,19] (19,20]
## 1618 715 38
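When reading the table above, note that `cut()` builds right-closed intervals by default and the hour is a whole number, so each label is offset from the clock hour it covers. The sketch below, using hypothetical time stamps, compares the default labels with left-closed bins (`right = FALSE`), which may be easier to read. The variable names and the extended break at 21 are assumptions for illustration, not part of the analysis above.

```r
# Hypothetical transaction times (made up for illustration)
times <- c("09:15:00", "09:59:59", "10:00:00", "17:45:58", "18:42:40")
hour <- as.integer(substr(times, 1, 2))

# Right-closed bins (the default): hour 9 is labelled (8,9]
rightClosed <- cut(hour, seq(8, 20, 1))

# Left-closed bins: hour 9 is labelled [9,10)
# The breaks run to 21 so that hour 20 still falls inside a bin
leftClosed <- cut(hour, seq(8, 21, 1), right = FALSE)

data.frame(times, hour, rightClosed, leftClosed)
```

Either labelling groups the sales identically for hours 9 to 20; only the names of the groups differ.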
#Total money made per time gap
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
timeTemp = data %>%
group_by(timeGroups, tempGroups) %>%
summarize(total = sum(Total) )
names(nrOfDaysPerTemp) = c('tempGroups', 'nrOfDays')
timeTemp = merge(x = timeTemp, y = nrOfDaysPerTemp, by = 'tempGroups')
timeTemp['scaledTotal'] = timeTemp$total / timeTemp$nrOfDays
ggplot(timeTemp, aes(x = timeGroups, y = scaledTotal)) + geom_point() + geom_line(aes(group=tempGroups))
ggplot(timeTemp, aes(x = timeGroups, y = scaledTotal)) + geom_point() + facet_wrap(~tempGroups)
#ggplot(test, aes(x = timeGroups, y = total)) + geom_point() + geom_line(aes(group=tempGroups))
#average money spent by customer per transaction, note
#testMean = data %>%
# group_by(timeGroups, tempGroups) %>%
# summarize(meanTotal = mean(Total) )
#ggplot(testMean, aes(x = timeGroups, y = meanTotal)) + geom_point() + facet_wrap(~tempGroups)
Insert text and analysis.
Summary:
How does the time of year affect the consumer decision when purchasing alcohol?
Other possible research questions: What time of day do people buy their alcohol?
Insert text and analysis.
Summary:
Insert text.
Style: APA